Abstract: Accurate modeling of the correlation between the 
sources plays a crucial role in the efficiency of distributed source 
coding (DSC) systems. This correlation is commonly modeled in 
the binary domain by using a single binary symmetric channel 
(BSC), both for binary and continuous-valued sources. We show 
that "one" BSC cannot accurately capture the correlation 
between continuous-valued sources; a more accurate model requires 
"multiple" BSCs, as many as the number of bits used to represent 
each sample. We incorporate this new model into the DSC system 
that uses low-density parity-check (LDPC) codes for compression. 
The standard Slepian-Wolf LDPC decoder requires a slight 
modification so that the parameters of all BSCs are integrated in 
the log-likelihood ratios (LLRs). Further, using an interleaver the 
data belonging to different bit-planes are shuffled to introduce 
randomness in the binary domain. The new system has the same 
complexity and delay as the standard one. Simulation results 
demonstrate the effectiveness of the proposed model and system. 

I. Introduction 

Distributed compression of spatially correlated signals, e.g., 
the observations of neighboring sensors in high density sensor 
networks, can drastically reduce the amount of data to be 
transmitted. The efficiency of compression, however, largely 
depends on the accuracy of the estimation of the correlation 
between the sources. The correlation is required at the encoder 
to determine the encoding rate; it is also required to initialize 
the decoding algorithm in the Slepian-Wolf coding schemes 
that use channel codes with iterative decoding, e.g., LDPC 
codes [1]. 

The correlation is unknown at the encoder and is modeled 
by a "virtual" channel. The estimation of the virtual correlation 
channel involves modeling it and estimating the model 
parameter [2]-[4]. Therefore, if this virtual correlation channel 
is not modeled accurately, even perfect estimation of the model 
parameter cannot guarantee efficient compression. 

The correlation between the two binary sequences $x^n$ and 
$y^n$ is commonly modeled by using a binary symmetric channel 
(BSC) with a crossover probability 

$$p = \Pr(y \neq i \mid x = i), \quad i \in \{0, 1\}. \tag{1}$$



The parameter p is either assumed to be known at the encoder 
[1] or needs to be estimated [2]-[5]. This model is also 
widely used in the compression of continuous-valued sources 
where Slepian-Wolf coding [6] is employed to compress the 
sources after quantization. Nevertheless, it is known that the 
correlation between continuous-valued sources can be modeled 
more accurately in the continuous domain. Specifically, the 
Gaussian distribution and its variations such as the Gaussian 
Bernoulli-Gaussian (GBG) and the Gaussian-Erasure (GE) 
distributions are used for this purpose, particularly when 
evaluating theoretical bounds [7]-[9]. 

In this paper, we first show that a "single" BSC cannot 
accurately model the correlation between continuous-valued 
sources, and we propose a new correlation model that exploits 
"multiple" BSCs for this purpose. The number of these 
channels is equal to the number of bits used in the binary 
representation of one sample. Each channel models the bits 
with the same significance, from the most significant bit 
(MSB) to the least significant bit (LSB); the set of bits with 
the same significance is referred to as a bit-plane [10]. 

We next focus on the implementation of the new model in 
the LDPC-based compression of continuous-valued sources. 
We modify the existing decoding algorithm for this specific 
model extracted from continuous-valued input sources and 
investigate its impact on the coding efficiency. Further, by 
using an interleaver before feeding data into the Slepian-Wolf 
encoder, the successive bits belonging to one sample are 
shuffled to introduce randomness to the errors in the binary 
domain. Numerical results, both in the binary and continuous 
domains, demonstrate the efficiency of the proposed scheme. 

The rest of the paper is organized as follows. The existing 
correlation models are discussed in Section II. In Section III, 
we introduce a new correlation model for continuous-valued 
sources. Section IV is devoted to the integration of the new model 
into LDPC-based Slepian-Wolf coding. Simulation results 
are presented in Section V. This is followed by conclusions 
in Section VI. 

II. Existing Correlation Models 

Lossless compression of correlated sources (Slepian-Wolf 
coding) is performed through the use of channel codes where 
one source is considered as a noisy version of the other one. 
This requires knowing the correlation between the sources at 
the decoder. 



A. Correlation Between Binary Sources 

The correlation and virtual communication channel between 
the binary sequences x and y are the same [11] and are usually 
modeled by a BSC with crossover probability p. The parameter 
of this channel is defined by (1). Equivalently, one can obtain 
p by averaging the Hamming weight of $x \oplus y$ over a long run 
of input data and side information, i.e., 

$$p = \lim_{n \to \infty} \frac{1}{n} w_H(x^n \oplus y^n). \tag{2}$$



Then, using binary channel coding, near-lossless compression 
with a vanishing probability of error can be achieved provided 
that the length of the channel code goes to infinity [1], [12]. 
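
As a rough illustration of the empirical estimate in (2), the following sketch passes a random binary sequence through a simulated BSC and averages the Hamming weight of the XOR. The function names `bsc` and `estimate_crossover`, the seed, and the sequence length are illustrative assumptions, not part of the original scheme.

```python
import random

def bsc(x, p, rng):
    """Pass the bit list x through a BSC with crossover probability p."""
    return [bit ^ (rng.random() < p) for bit in x]

def estimate_crossover(x, y):
    """Empirical crossover probability: Hamming weight of x XOR y, divided by n."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(a ^ b for a, b in zip(x, y)) / len(x)

rng = random.Random(0)                      # fixed seed, only for reproducibility
n = 100_000                                 # a "long run" of data (assumed length)
x = [rng.getrandbits(1) for _ in range(n)]  # source bits
y = bsc(x, 0.1, rng)                        # side information through BSC(0.1)
p_hat = estimate_crossover(x, y)            # should be close to 0.1 for large n
```

For finite n the estimate fluctuates around the true p; the limit in (2) removes this fluctuation.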

B. Correlation Between Analog Sources 

In general, the correlation between the two analog sources 
X and Y can be defined by 



$$Y = X + E, \tag{3}$$



where E is a real-valued random variable. Specifically, for the 
Gaussian sources we usually have 



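$$E \sim \begin{cases} \mathcal{N}(0, \sigma_e^2) & \text{w.p. } q_1, \\ \mathcal{N}(0, \sigma_e^2 + \sigma_i^2) & \text{w.p. } q_2, \\ 0 & \text{w.p. } 1 - q_1 - q_2, \end{cases} \tag{4}$$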



in which $\sigma_i^2 \gg \sigma_e^2$ and $q_1 + q_2 \le 1$. This model contains 
several well-known models which are suited for video coding 
and sensor networks. For example, for $q_1 = 1$ or $q_2 = 1$ the 
Gaussian correlation is obtained, which is broadly used in the 
literature when X and Y are Gaussian. Further, for $q_1 + q_2 = 1$ 
the GBG and for $q_1 + q_2 < 1$, $q_1 q_2 = 0$ the GE models are 
realized. The latter two models are more suitable for video 
applications [8]. These models are also used for evaluating 
theoretical bounds and performance limits [7], [8]. 
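
A minimal sketch of drawing error samples from the mixture model in (4) is given below. It assumes that the remaining probability mass, 1 - q1 - q2, corresponds to E = 0; the function name `sample_error` and the numerical values are hypothetical choices for illustration only.

```python
import math
import random

def sample_error(q1, q2, sigma_e, sigma_i, rng):
    """Draw one error sample E from the three-component mixture.

    Assumption: with probability 1 - q1 - q2 the error is exactly zero.
    """
    if q1 < 0 or q2 < 0 or q1 + q2 > 1 + 1e-12:
        raise ValueError("need q1, q2 >= 0 and q1 + q2 <= 1")
    u = rng.random()
    if u < q1:                                   # narrow Gaussian component
        return rng.gauss(0.0, sigma_e)
    if u < q1 + q2:                              # wide Gaussian component
        return rng.gauss(0.0, math.sqrt(sigma_e**2 + sigma_i**2))
    return 0.0                                   # no error

rng = random.Random(0)
# GE-type setting: q1 = 1/5, q2 = 0 (sigma values are arbitrary examples)
errors = [sample_error(0.2, 0.0, 0.5, 5.0, rng) for _ in range(50_000)]
frac_nonzero = sum(e != 0.0 for e in errors) / len(errors)
```

Setting q1 = 1 recovers the plain Gaussian correlation, and q1 + q2 = 1 gives the GBG case.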

Although the correlation between continuous-valued sources 
can be modeled more accurately in the continuous domain, 
in practice it is usually modeled in the binary domain. This 
is due to the fact that, even for continuous-valued sources, 
compression is mostly done through the use of binary channel 
codes.¹ To do so, the two sources are quantized and their 
correlation is modeled by a virtual BSC in the binary domain, 
as shown in Fig. 1(a). In the next section, however, we show 
that this assumption is not very accurate, and we propose an 
alternative, more accurate model. 

III. A New Correlation Channel Model 

A. Evaluating the Single BSC Model 

Let X and Y be two continuous-valued sources. When using 
binary channel codes for compression, X and Y need to be 
quantized before compression.¹ Then, as shown in Fig. 1(a), 
the correlation between x and y (the binary representation of 
X and Y) is defined in the binary domain by means of a BSC. 

¹It is possible to do compression before quantization; this requires 
real-number channel codes and brings about a different paradigm for DSC [9]. 

Fig. 1. Virtual correlation channel models for continuous-valued sources 
(X and Y) in the binary domain. (a) Current model. (b) New model for a b-bit 
scalar quantizer; $x^1$ to $x^b$ are b subsequences of x that contain data belonging 
to the different bit-planes. 

Fig. 2. Crossover probabilities of different BSCs, each corresponding to 
one bit-plane, at different channel-error-to-quantization-noise ratios ($\sigma_e^2/\sigma_q^2$). 
$X \sim \mathcal{N}(0, 1)$ and Y is defined by (3), (4), where $q_1 = 1/5$ and $q_2 = 0$. 
Quantization is done using a 6-bit scalar uniform quantizer. 



We observe that this model is not very accurate. This is 
because the bits resulting from quantization of a sample and 
its corresponding side information are not independent. For 
example, if $X_i$ (a sample of X) and its counterpart $Y_i$ are 
the same, then all bits resulting from those samples will be 
identical. That is, the correlation between these bits cannot 
be modeled independently. A more quantitative example is 
obtained by considering the model in (4) with $q_1 = 1$. 
Hence, $E \sim \mathcal{N}(0, \sigma_e^2)$ and $\Pr(|E| > 2\sigma_e) < 5\%$. Now 
if $\sigma_e = \Delta/2$, where $\Delta$ is the quantization step size, we 
will have $\Pr(|E| > \Delta) < 5\%$. This means that in y (the 
binary representation of Y), most probably only the two 
least significant bits will be affected. In other words, the more 
significant bits of x and y are identical with high probability. 
Numerical results in Fig. 2 verify this observation. 
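
The numbers in this example can be checked directly. In the worked lines below, Q denotes the standard Gaussian tail function (a notation assumed here for convenience), and the quantization noise variance of a uniform quantizer with step size Δ is taken as Δ²/12.

```latex
\Pr(|E| > 2\sigma_e) = 2\,Q(2) \approx 0.0455 < 5\%,
\qquad
\sigma_e = \frac{\Delta}{2} \;\Rightarrow\; \Pr(|E| > \Delta) = 2\,Q(2) \approx 0.0455,
\qquad
\frac{\sigma_e^2}{\sigma_q^2} = \frac{\Delta^2/4}{\Delta^2/12} = 3 \approx 4.8\ \text{dB}.
```

The example therefore corresponds to a channel-error-to-quantization-noise ratio of about 4.8 dB.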

The above discussion indicates that at low channel-error-to-quantization-noise 
ratios ($\sigma_e^2/\sigma_q^2$, $\sigma_q^2 = \Delta^2/12$) the more 
significant bits of $x \oplus y$ (the error in the binary domain) are 0 
with high probability. Therefore, correlation parameters differ 
depending on the bit position (bit-plane); i.e., an independent 
error in the sample (continuous) domain cannot be translated 
to an i.i.d. error in the binary domain. In other words, a bitwise 
correlation with the same parameter for all bit positions is not 
suited for continuous-valued sources. 

In the remainder of this paper, a novel approach is proposed 
to deal with this problem. The key is to find a way to effectively 
model and implement the aforementioned dependency. 

B. Proposed Model 

It is clear that the bits generated from different samples of 
a source (say $X_i$ and $X_j$) are independent as long as these 
samples are generated independently. Also, considering the 
correlation in the continuous domain, it can be seen that the same 
argument is valid for the binary representations of X and Y. 
That is, $x_i$ and $y_j$ are independent if they are generated from 
different samples. This is because $X_i$ is related to $Y_i$ (through 
$E_i$) but is independent of $Y_j$ for any $j \neq i$. 

This indicates that, using a b-bit quantizer, b BSCs are 
enough to efficiently model the correlation between the two 
correlated continuous-valued sources; each of these channels 
is used to model the correlation between bits corresponding to 
one bit-plane. For instance, BSC($p_b$) is used to model the 
correlation between the MSBs of X and Y in the binary domain. 
This is shown in Fig. 1(b). Numerical results, presented in 
Fig. 2, confirm that these channels have different parameters. 
Moreover, with high probability, at low and moderate channel 
noise levels we have 



$$p_1 > p_2 > \cdots > p_b, \tag{5}$$



where the indices 1 to b, respectively, represent the channels 
corresponding to the LSB to the MSB. This is intuitively appealing 
because even a small error in the continuous domain ($E_i$) can 
invert the LSB, while the MSB is affected only by large 
errors. Note that the parameter of the conventional single-BSC 
model is obtained by 

$$p = \frac{1}{b} \sum_{k=1}^{b} p_k. \tag{6}$$
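
The per-bit-plane crossover probabilities and their average can be estimated by simulation, as in the sketch below. Several details are assumptions made only for illustration: the quantizer range [-4, 4), the natural binary labeling of quantizer indices, the Gaussian-only error (q1 = 1), and the names `quantize` and `bitplane_crossovers`.

```python
import math
import random

def quantize(v, b, vmax):
    """Uniform b-bit quantizer on [-vmax, vmax); returns an index in [0, 2**b - 1]."""
    step = 2 * vmax / 2**b
    idx = math.floor((v + vmax) / step)
    return min(max(idx, 0), 2**b - 1)      # clip out-of-range samples

def bitplane_crossovers(xs, ys, b, vmax):
    """Return [p_1, ..., p_b]; p_1 belongs to the LSB and p_b to the MSB."""
    flips = [0] * b
    for xv, yv in zip(xs, ys):
        diff = quantize(xv, b, vmax) ^ quantize(yv, b, vmax)
        for k in range(b):
            flips[k] += (diff >> k) & 1    # bit k of the XOR of the two indices
    return [f / len(xs) for f in flips]

rng = random.Random(1)
b, vmax, n = 6, 4.0, 50_000                # assumed range and sample count
step = 2 * vmax / 2**b
sigma_e = step / 2                         # sigma_e = Delta / 2
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [xv + rng.gauss(0.0, sigma_e) for xv in xs]
p = bitplane_crossovers(xs, ys, b, vmax)   # one crossover estimate per bit-plane
p_single = sum(p) / b                      # parameter of the single-BSC model
```

Under these assumptions the estimated probabilities decrease from the LSB to the MSB, and the single-BSC parameter lies between the two extremes.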



We next discuss the incorporation of this new model into the 
DSC framework that uses LDPC codes for compression. 

IV. Decoding Using LDPC Codes 

In this section, we present three different implementations 
of the introduced correlation model in the Slepian-Wolf coding 
based on LDPC codes. These are named parallel, sequential, 
and hybrid decoding. 

A. Parallel Decoding 

A first idea is to divide the input sequence into b sub-streams, 
each of which contains only the bits with the same 
significance. Now each channel can be modeled by one BSC 
with its own parameter. Hence, we can implement b parallel 
LDPC decoders, each corresponding to one correlation channel. 
This implies b LDPC decoders at the decoding center, which 
increases the complexity. In particular, effective compression 
requires codes with different rates, as the BSC parameter 
differs from one bit-plane to another. Then, the code 
corresponding to the MSB, for example, will have the highest 
rate, as it has the smallest p. On the other hand, given the 
same code for all channels, the MSB will be decoded with 
the lowest BER. Given the same LDPC code for all channels, 
the complexity increases b times in the new approach; the 
delay is the same, assuming that the inputs of all decoders are 
available at the receiver. 
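
The division into sub-streams can be sketched as follows, assuming the bit sequence is stored sample by sample with the LSB first in each sample. The bit ordering and the helper names `split_bitplanes` and `merge_bitplanes` are assumptions for illustration.

```python
def split_bitplanes(bits, b):
    """Split a sample-by-sample bit list into b sub-streams, one per bit-plane."""
    if len(bits) % b:
        raise ValueError("sequence length must be a multiple of b")
    return [bits[k::b] for k in range(b)]   # sub-stream k holds bit k of every sample

def merge_bitplanes(planes):
    """Inverse of split_bitplanes: re-assemble the sample-by-sample order."""
    return [bit for sample in zip(*planes) for bit in sample]

bits = [1, 0, 0, 0, 1, 1]                   # two samples of b = 3 bits each
planes = split_bitplanes(bits, 3)
restored = merge_bitplanes(planes)
```

Each sub-stream would then be handled by its own decoder, initialized with the crossover probability of that bit-plane.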

B. Sequential Decoding 

By using sequential decoding, the number of decoders can 
be reduced to one at the cost of increased delay. To do so, 
we let the decoder decode different sub-streams sequentially. 
Note that each time the LDPC decoder is initialized with 
the corresponding $p_k$. It can be seen that, compared to 
parallel decoding, the complexity is reduced b times while the 
delay increases b times. The latter is due to the fact that, in 
order for the decoder to reconstruct one sample of X, it must wait 
for the output of b LDPC blocks. 

C. Hybrid Decoding 

A yet more efficient integration of the new correlation model 
into the LDPC-based DSC can be achieved by using just a 
single LDPC encoder/decoder. This is done in two steps, as 
explained in the following. 

1) Manipulating the LLRs: The parameters of the multiple-BSC 
correlation model can be incorporated into the LDPC-based 
DSC by judiciously setting the LLR sent from (to) the 
variable nodes. The idea is to take into account the bit-plane 
to which each bit belongs. This requires a slight change in 
the standard LDPC decoding algorithm. Specifically, using the 
notation in [1], we just need to adjust the LLR sent from (to) 
the variable nodes. That is, equation (1) in [1] will be modified 
as 



$$q_{i,0} = \log \frac{\Pr[x_i = 0 \mid y_i]}{\Pr[x_i = 1 \mid y_i]} = (1 - 2y_i) \log \frac{1 - p_{k[i]}}{p_{k[i]}}, \tag{7}$$



in which $i = 1, \ldots, n$, $p_{k[i]} \in \{p_1, \ldots, p_b\}$, and k represents 
the bit-plane to which $y_i$ (or $x_i$) belongs. This is illustrated 
in Fig. 3. For example, if $x_i$ is the LSB in its corresponding 
sample, then k = 1. Note that if $b \mid n$, where n is the code 
length, then k = (i mod b). 
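
A minimal sketch of this LLR initialization is shown below. It assumes zero-based positions with the LSB first in each sample, so position i uses `p[i % b]` with `p[0]` standing for p_1. The function name `initial_llrs` and the clamping constant `eps` are illustrative additions, not part of the original algorithm.

```python
import math

def initial_llrs(y, p, b, eps=1e-12):
    """Initial LLR of each source bit given its side-information bit.

    y : list of side-information bits (0 or 1)
    p : [p_1, ..., p_b], crossover probability of each bit-plane (LSB first)
    b : number of bits per sample
    """
    llrs = []
    for i, yi in enumerate(y):
        pk = min(max(p[i % b], eps), 1 - eps)   # guard against log(0); an added safeguard
        llrs.append((1 - 2 * yi) * math.log((1 - pk) / pk))
    return llrs

# Two samples of b = 2 bits, with p_1 = 0.25 (LSB) and p_2 = 0.1 (MSB)
llrs = initial_llrs([0, 1, 0, 1], [0.25, 0.1], 2)
```

Bits from more reliable bit-planes (smaller crossover probability) start with larger LLR magnitudes, which is what distinguishes this initialization from the single-BSC one.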

Since the initial LLRs become more accurate with this 
method, the number of iterations required to achieve the same 
performance decreases. However, the performance gap is still 
noticeable. To bridge this gap, we propose to interleave the 
input data (and side information) in the binary domain. 

2) Interleaving: As we discussed in Section III, the bits 
corresponding to each error sample, which are located in a row, 
are correlated. By interleaving x and y before feeding them 
into the Slepian-Wolf encoder and decoder, these successive 
bits can be shuffled to introduce randomness to the errors. 
Then, it makes better sense to encode data belonging to all 
bit-planes together, as in the conventional approach. The longer 
the permutation block, the more accurate the model and 
the better the performance. Interleaving, however, can increase 
the delay at the receiver side since we need deinterleaving 
after the Slepian-Wolf decoder. To avoid excessive delay, we 
set the length of the interleaving block equal to the length of the 
LDPC code. The improvement in the BER and MSE due to 
interleaving alone is remarkably high. Obviously, we can 
use interleaving and LLR manipulation simultaneously; this 
requires applying interleaving to the crossover probabilities, 
depicted in Fig. 3, as well. 

Fig. 3. Variable nodes and their corresponding p in the hybrid LDPC-based decoding for the block length $n = 10^4$ and b = 6. 
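
The interleaving step can be sketched as below. The sketch assumes a pseudo-random permutation derived from a seed shared by the encoder and decoder, which is one possible choice rather than a design taken from the paper; the helper names are hypothetical. The same permutation is applied to the per-position crossover probabilities so that each LLR still uses the parameter of its own bit-plane.

```python
import random

def make_interleaver(n, seed):
    """Pseudo-random permutation of range(n); the seed is assumed to be shared."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm

def interleave(seq, perm):
    """Output position pos takes the element originally at position perm[pos]."""
    return [seq[j] for j in perm]

def deinterleave(seq, perm):
    """Undo interleave(): put every element back in its original position."""
    out = [None] * len(perm)
    for pos, j in enumerate(perm):
        out[j] = seq[pos]
    return out

n, b = 12, 3                                  # toy block: four samples of three bits
perm = make_interleaver(n, seed=7)
x = [i % 2 for i in range(n)]                 # example bit sequence
p = [0.3, 0.1, 0.02]                          # example p_1..p_b (LSB first)
p_pos = [p[i % b] for i in range(n)]          # crossover probability of each position
x_int = interleave(x, perm)                   # fed to the Slepian-Wolf encoder
p_int = interleave(p_pos, perm)               # used to initialize the LLRs
x_back = deinterleave(x_int, perm)            # applied after decoding
```

In the scheme described above the block length of the permutation would equal the LDPC code length.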

Another important advantage of this approach is that it can 
be used to combat the bursty correlation channels, as a perfect 
interleaver transforms a bursty channel into an independently 
distributed channel. The bursty correlation channel model is 
capable of addressing the bursty nature of the correlation 
between sources in applications such as sensor networks and 
video coding, since it takes the memory of the correlation into 
account [13]. 

V. Simulation Results and Performance Evaluation 

In this section, we numerically compare the new decoding 
algorithm with the conventional approach, which considers just 
one BSC for the correlation model. We use an irregular LDPC 
code of rate 1/2 with the degree distribution [1] 

$$\lambda(x) = 0.234029x + 0.212425x^2 + 0.146898x^5 + 0.102840x^6 + 0.303808x^{19},$$
$$\rho(x) = 0.71875x^7 + 0.28125x^8.$$

The frame length is $10^4$ and the bit error rate (BER) and 
corresponding mean-squared error (MSE) are measured after 
50 iterations in both schemes. The source X is a zero-mean, 
unit-variance Gaussian. Also, the correlation between X and 
Y is defined by the GE channel with $q_1 = 1/5$, $q_2 = 0$ in (4), 
and the channel-error-to-quantization-noise ratio ($\sigma_e^2/\sigma_q^2$) varies as 
shown in Fig. 4(b). Both sources are quantized with a 6-bit 
scalar uniform quantizer. 
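
As a quick consistency check on the stated rate, the design rate implied by the degree distribution can be computed. The sketch assumes the usual edge-perspective convention, in which the coefficient of x^(d-1) is the fraction of edges attached to nodes of degree d; the function name `design_rate` is illustrative.

```python
def design_rate(lam, rho):
    """Design rate 1 - (integral of rho) / (integral of lambda) over [0, 1].

    lam, rho: dicts mapping the exponent of x to its coefficient
    (edge-perspective degree distributions, assumed convention).
    """
    int_lam = sum(c / (e + 1) for e, c in lam.items())
    int_rho = sum(c / (e + 1) for e, c in rho.items())
    return 1 - int_rho / int_lam

lam = {1: 0.234029, 2: 0.212425, 5: 0.146898, 6: 0.102840, 19: 0.303808}
rho = {7: 0.71875, 8: 0.28125}
rate = design_rate(lam, rho)   # close to 1/2 for the coefficients above
```

Both coefficient sets sum to one, as expected for edge fractions.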

Simulation results are presented in Fig. 4(a)-Fig. 4(c). In 
these figures, "actual data" refers to the case where the binary 
sequences x and y are obtained by quantizing X and Y. We 
also compute the BER for the case where the side information y is 
generated by passing x through a virtual BSC with parameter 
p, which is conventional in practical Slepian-Wolf coding [2]-[5]. 
This is labeled as "artificial data." The fact that "actual" 
and "artificial" side information result in very different BERs, 
by itself, indicates that a single BSC is not an appropriate 
model for the correlation between continuous-valued sources. In 
contrast, the BER resulting from hybrid decoding with 
actual side information is significantly better than that of the 
conventional approach, which shows the suitability of the new 
model. Fig. 4(b) shows the corresponding MSE. From 
these figures, it can be seen that the new scheme (hybrid 
decoding) greatly outperforms the existing method for actual 
data. Furthermore, as shown in Fig. 4(c), the number of 
iterations required to achieve such a performance is much 
smaller than that of the existing method, owing to more accurate 
initial LLRs. 

The performance of parallel and sequential decoding, for 
the same code, is the same. These schemes benefit from 
working on data belonging to separate bit-planes. 
Hence, one BSC can effectively approximate the corresponding 
correlation for each bit-plane. Simulation results 
verify that separate compression of data belonging to different 
bit-planes using actual data is as effective as the case that 
uses artificial side information. Moreover, there is no need 
for interleaving. However, efficient compression in parallel 
and sequential decoding requires codes with different rates for 
each bit-plane. Alternatively, this can be implemented through 
the use of rate-adaptive LDPC codes [14]. 

VI. Conclusions 

We have introduced an improved model for the virtual correlation 
between continuous-valued sources in the binary 
domain. This model exploits multiple BSCs rather than the 
conventional single-BSC model so that it can deal with the 
dependency among the bits resulting from quantization of each 
error sample by converting the error sequence into multiple 
i.i.d. sequences. An efficient implementation of the new model 
is realized by using a single LDPC decoder and judiciously 
setting the LLR sent from (to) the variable nodes. The number 
of iterations required to achieve the same performance is reduced 
noticeably as a result of this setting of the initial LLRs. 
In addition, by interleaving the data and side information, the bits 
belonging to one error sample are shuffled, which improves the 
decoding performance to a great extent. This significant 
improvement in the BER and MSE is achieved without any 
increase in the complexity or delay. The new scheme can also 
be used to combat the bursty nature of the correlation channel 
in practical applications. 
Fig. 4. Performance evaluation for irregular rate-1/2 LDPC codes of length $n = 10^4$ for a maximum of 50 iterations. SD and HD, respectively, refer to 
"standard decoding" [1], which is based on a single BSC, and "hybrid decoding" (proposed in this paper), based on multiple BSCs. "Actual data" is generated 
by quantizing real-valued X and Y to x and y, whereas in "artificial data" y is generated artificially by passing x through a BSC(p), which is the common 
approach in the literature. (a) The BER performance. (b) The end-to-end distortion (MSE). (c) Average number of iterations used to achieve the BER and the 
corresponding MSE in Fig. 4(a) and Fig. 4(b). 